Abstract. Consider the problem of value iteration for solving Markov stochastic games. One simply iterates backwards, via a Jacobi-like procedure. The convergence of the Gauss-Seidel form of this procedure is shown for both the discounted and ergodic cost problems, under appropriate conditions, with extensions to problems where one stops when a boundary is hit or if any one of the players chooses to stop, with associated costs. Generally, the Gauss-Seidel procedure accelerates convergence.

Key words. Stochastic games, Markov games, Gauss-Seidel procedure, numerical algorithms

I.  Introduction 

We consider two-player, zero-sum, finite-state Markov stochastic games. There are $N$ states and, unless noted otherwise, we suppose that the controls are feedback and not randomized. In state $i$, the control of player 1 (the minimizing player) is denoted by $u_i$ and that of player 2 (the maximizing player) is denoted by $v_i$. The convergence of the value iteration procedure (see (2.2) below) for Markov stochastic games for a discounted cost function (or where there is an absorbing boundary) was established in [9], [13]. The convergence of the Gauss-Seidel procedure was first established for the control problem in [12]. It is widely used and is never slower, and is generally faster, than the Jacobi procedure; see, for example, [7], [12] and [11, Chapter 6]. It will be seen that the Gauss-Seidel procedure can be viewed as an iteration with a modified transition matrix $Q$. The references discuss the nature of the transition probability that $Q$ represents and show why it is faster. In particular, it has a smaller spectral radius than the transition matrix of the original problem. The ordering of the states in the iteration plays an important role in getting the best acceleration of convergence. If there is an absorbing set, then it is best to order the states so that the mean flow is toward that set. In practice, where there is no absorbing set, the ordering is often changed from cycle to cycle, say "reversing direction," to provide a greater mixing, which also accelerates convergence. See [7], [12] and [11, Chapter 6] for a discussion of preferred orderings, a point that we do not have the space to deal with here, but which is the same for the control and the game problem.

The convergence of the Gauss-Seidel form has not yet been established for the game problem. Under appropriate conditions, the convergence will be established for the discounted and ergodic cost functions, and for related problems such as where there is an absorbing boundary or optional stopping.

The $u_i, v_i$ take values in compact sets that might depend on $i$. Define the control vectors $u = \{u_i; i \le N\}$, $v = \{v_i; i \le N\}$. Let $\{\tilde p_{ij}(u_i,v_i); i,j \le N\}$ denote the transition probabilities under controls $u, v$, and define $p_{ij}(u_i,v_i) = \rho\,\tilde p_{ij}(u_i,v_i)$, where $\rho \in (0,1)$ is the discount factor. Define the degenerate transition matrix $P(u,v) = \{\rho\,\tilde p_{ij}(u_i,v_i); i,j \le N\}$. Hence the row sums of $P(u,v)$ are $\rho$, so each falls short of unity by $1-\rho$. The cost rate when in state $i$ and under $u_i, v_i$ is the function $k_i(u_i,v_i)$. Let $\{X_n\}$ denote the random variables of the chain. Then the discounted cost under $u, v$ is

$$C_i(u,v) = E_i^{u,v} \sum_{n=1}^{\infty} \rho^n k_{X_n}(u_{X_n}, v_{X_n}),$$

where $E_i^{u,v}$ denotes the expectation under $u, v$ and with initial state $i$. It is always supposed that the $p_{ij}(u_i,v_i)$ and $k_i(u_i,v_i)$ are continuous in the $u_i, v_i$. Define the vector $K(u,v) = \{k_i(u_i,v_i); i \le N\}$.

In addition, unless noted otherwise, we assume that the Isaacs condition holds; namely, that for any $N$-vector $H = \{h_i; i \le N\}$,

$$\sup_v \inf_u \left[P(u,v)H + K(u,v)\right] = \inf_u \sup_v \left[P(u,v)H + K(u,v)\right]. \tag{1.1}$$

In vector forms such as (1.1), it is always supposed that the inf and sup are taken line by line, so that the $i$th line is $\sup_{v_i}\inf_{u_i}\big[\sum_j p_{ij}(u_i,v_i)h_j + k_i(u_i,v_i)\big]$ and involves the inf and sup over $u_i$ and $v_i$ only. The condition (1.1) is used for notational simplicity. Otherwise, one must randomize the controls. Then, when the number of control values is finite, the control is replaced by the vector of probabilities, and the analog of (1.1) holds. If the controls take values in a continuum and (1.1) does not hold, then the randomization is more complicated, but the results can be readily extended. The condition (1.1) commonly holds for the games arising as numerical approximations to stochastic differential games under the conditions of [6], [8].
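When the control sets are finite, one line of (1.1) can be checked directly. The following is a minimal sketch, not taken from the references: the array layout (`p_i[a, b, j]`, `k_i[a, b]`) and all function names are assumptions made for illustration. It compares the lower value $\sup_{v_i}\inf_{u_i}$ with the upper value $\inf_{u_i}\sup_{v_i}$ over non-randomized controls.

```python
import numpy as np

def line_payoff(p_i, k_i, h):
    """Payoff table sum_j p_ij(a, b) h_j + k_i(a, b) for one state i.

    p_i[a, b, j] and k_i[a, b] are hypothetical finite-control arrays:
    a indexes player 1 (minimizer), b indexes player 2 (maximizer).
    """
    return p_i @ h + k_i

def lower_upper(payoff):
    """Return (sup_v inf_u, inf_u sup_v) over pure controls."""
    lower = payoff.min(axis=0).max()  # minimize over a for each b, then maximize over b
    upper = payoff.max(axis=1).min()  # maximize over b for each a, then minimize over a
    return lower, upper

def isaacs_holds(payoff, tol=1e-12):
    """True when the two orders of optimization agree on this line."""
    lower, upper = lower_upper(payoff)
    return upper - lower <= tol

# A table with a pure saddle point, and one without (matching pennies).
saddle = np.array([[1.0, 2.0], [3.0, 4.0]])
pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(isaacs_holds(saddle), isaacs_holds(pennies))
```

For a table such as `pennies`, where the two values differ, the text's remedy applies: the controls are randomized and the analog of (1.1) is stated for the probability vectors.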

Section 2 concerns the discounted cost problem and also remarks on cases where there is forced stopping on hitting a boundary or with optional stopping. The ergodic cost problem is dealt with in Section 3 and the cost under $u, v$ is

$$\gamma(u,v) = \lim_n \frac{1}{n} E_i^{u,v} \sum_{l=1}^{n} k_{X_l}(u_{X_l}, v_{X_l}). \tag{1.2}$$

In Section 3, it is first shown that the game version of a classical value iteration method converges. Then this is adapted to the Gauss-Seidel procedure. If controls $u^n, v^n$ are used at time $n$, then write $p_{ij}^{(n)}(u^n,v^n; u^{n-1},v^{n-1}; \cdots; u^1,v^1)$ for the $n$-step transition probabilities. The ergodic cost problem uses the additional assumption that there is an $\epsilon > 0$, a state $j_0$, and an integer $m < N$, such that

$$p_{ij_0}^{(m)}(u^m,v^m; u^{m-1},v^{m-1}; \cdots; u^1,v^1) \ge \epsilon \tag{1.3}$$

for all possible controls. This is a standard condition for the ergodic cost problem in the control literature [15]. See also [1, Vol. 2] and [7, pp. 156-158]. Of particular interest are Markov chain games that arise as numerical approximations of games with diffusion models as in [6], [8], where (1.3) will commonly hold under the assumptions on the nondegeneracy of the diffusion in [6].

To date, there have not been proofs of the convergence of the Gauss-Seidel method for either the game or the control problem with ergodic cost criteria. Indeed, it does not always converge, even under (1.3) for ergodic models. But it will converge if (1.3) holds for a modified transition probability. This will be discussed further in Section 3. The modified condition holds for the chains obtained as approximations in [6] under the nondegeneracy conditions used there. These chains are obtained via the Markov chain approximation methods of [11]. The book [4] discusses other numerical procedures, based on nonlinear programming methods, and (under some smoothness conditions) develops a convergent modified Newton procedure that has the rate of the policy iteration procedure whenever that converges. The paper [5] discusses what might be called a type of combined value iteration and approximation in policy space method. The paper [3] numerically compares a variety of approaches and shows that the policy iteration algorithm performs best when it converges (which is not always the case). The papers [2], [14] discuss a variety of modifications of value and policy iteration.

II.  The  Discounted  Cost  Problem  and 
Extensions 

Until further notice, we consider the discounted cost case. Let $C_i$ denote the value of the game when starting in state $i$ and define $C = \{C_i; i \le N\}$. In vector form, the equation for the value is

$$C = \sup_v \inf_u \left[P(u,v)C + K(u,v)\right] = \inf_u \sup_v \left[P(u,v)C + K(u,v)\right]. \tag{2.1}$$

In all such vector equations, the inf sup is taken by line; the $i$th line is over $u_i, v_i$. Recall that the discounting is incorporated into the $P(u,v)$. Hence, for any integer $m \ge 1$, the $m$-step matrix $P^{(m)}(u^m,v^m; \cdots; u^1,v^1)$ is a contraction (in the Euclidean norm sense) uniformly in the choices of the controls $\{u^n,v^n\}$. A unique solution $C$ exists and is the value [4, Theorem 3.1.1]. Let $\bar u, \bar v$ denote any controls that realize (2.1).

Our aim is the computation of $C$, hence of optimizing controls as well. A variety of computational methods are available. In [9], [10], [13] it was shown that, for any $C^0$, the $C^n$ in the iteration in value space algorithm

$$C^{n+1} = \sup_v \inf_u \left[P(u,v)C^n + K(u,v)\right] \tag{2.2}$$

converge to $C$.
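The iteration (2.2) can be sketched for finite control sets as follows. This is an illustration under stated assumptions, not the implementation of any of the references: the array layout `p[i, a, b, j]`, `k[i, a, b]`, the random test game, and the function names are hypothetical, and the sweep takes the sup inf over non-randomized controls only (so it computes the lower value if (1.1) fails for the random data).

```python
import numpy as np

def jacobi_sweep(C, p, k, rho):
    """One pass of (2.2): every line is computed from the previous iterate C.

    p[i, a, b, j]: undiscounted transition probability i -> j (assumed layout)
    k[i, a, b]:    cost rate in state i; a = player 1 (min), b = player 2 (max)
    """
    payoff = rho * (p @ C) + k               # shape (N, n_u, n_v)
    return payoff.min(axis=1).max(axis=1)    # inf over a, then sup over b, per line

def value_iteration(p, k, rho, sweeps):
    C = np.zeros(k.shape[0])                 # arbitrary starting vector C^0
    for _ in range(sweeps):
        C = jacobi_sweep(C, p, k, rho)
    return C

# Hypothetical random game: 6 states, 3 controls per player.
rng = np.random.default_rng(0)
p = rng.random((6, 3, 3, 6))
p /= p.sum(axis=-1, keepdims=True)           # each p[i, a, b, :] sums to one
k = rng.random((6, 3, 3))
rho = 0.9

C = value_iteration(p, k, rho, sweeps=500)
residual = np.max(np.abs(jacobi_sweep(C, p, k, rho) - C))
print("fixed-point residual:", residual)
```

Because the row sums of the discounted matrix are $\rho$, each sweep shrinks the sup-norm distance between any two iterates by at least the factor $\rho$, which is why a fixed number of sweeps suffices in the sketch.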

The Gauss-Seidel procedure for the game problem is the iteration in value space with successive substitutions, taken in the order $i = 1, 2, \ldots$,

$$C_i^{n+1} = \sup_{v_i} \inf_{u_i} \Big[\sum_{j=1}^{i-1} p_{ij}(u_i,v_i)\,C_j^{n+1} + \sum_{j=i}^{N} p_{ij}(u_i,v_i)\,C_j^{n} + k_i(u_i,v_i)\Big]. \tag{2.3}$$

Taking the sup inf in (2.3) is equivalent to solving a matrix game. Except for this sup inf, it is just the standard Gauss-Seidel method for iteratively solving linear equations. The convergence proof in Theorem 2.1 adapts the method of [7], [12]. The ordering of the states can vary with $n$.
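A minimal sketch of the sweep (2.3) for finite control sets follows, set beside the Jacobi sweep for comparison. The array layout, the random game, and all names are assumptions made for the example, and the sup inf is taken over non-randomized controls only. The optional `order` argument reflects the remark that the ordering of the states can vary with $n$.

```python
import numpy as np

def sup_inf(payoff):
    """sup over b of inf over a of payoff[..., a, b] (pure controls)."""
    return payoff.min(axis=-2).max(axis=-1)

def jacobi_sweep(C, p, k, rho):
    return sup_inf(rho * (p @ C) + k)

def gauss_seidel_sweep(C, p, k, rho, order=None):
    """One pass of (2.3); `order` may change from sweep to sweep."""
    C = C.copy()
    for i in (range(len(C)) if order is None else order):
        # C already holds the new values for the states visited earlier
        C[i] = sup_inf(rho * (p[i] @ C) + k[i])
    return C

def iterate(sweep, p, k, rho, sweeps):
    C = np.zeros(k.shape[0])
    for _ in range(sweeps):
        C = sweep(C, p, k, rho)
    return C

# Hypothetical random game: 6 states, 3 controls per player.
rng = np.random.default_rng(1)
p = rng.random((6, 3, 3, 6))
p /= p.sum(axis=-1, keepdims=True)
k = rng.random((6, 3, 3))                    # nonnegative cost rates
rho = 0.9

C_ref = iterate(jacobi_sweep, p, k, rho, 2000)        # near-exact value
err_jacobi = np.max(np.abs(iterate(jacobi_sweep, p, k, rho, 20) - C_ref))
err_gs = np.max(np.abs(iterate(gauss_seidel_sweep, p, k, rho, 20) - C_ref))
print(err_jacobi, err_gs)
```

With a zero starting vector and nonnegative costs, both iterations increase monotonically toward the value and the Gauss-Seidel iterate dominates the Jacobi iterate componentwise, so in this setting its error after a given number of sweeps is no larger. How much is gained depends on the ordering of the states.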

Before proceeding, it is convenient to define a transition probability $Q(u,v)$ and cost vector $\bar K(u,v)$ (distinct from $K(u,v)$) that play a crucial role in the analysis. This $Q$ will be the effective transition probability that determines the behavior of the Gauss-Seidel procedure. Consider the set of linear equations in an unknown $D = \{D_i\}$, where the vector $C$ is given, solved by successive substitution in the order $i = 1, 2, \ldots, N$:

$$D_i = \sum_{j=1}^{i-1} p_{ij}(u_i,v_i)\,D_j + \sum_{j=i}^{N} p_{ij}(u_i,v_i)\,C_j + k_i(u_i,v_i). \tag{2.4}$$

This uniquely defines a matrix $Q(u,v) = \{q_{ij}(u,v); i,j \le N\}$ and vector $\bar K(u,v) = \{\bar k_i(u,v); i \le N\}$ such that $D = Q(u,v)C + \bar K(u,v)$. In detail, by successive substitutions in (2.4), we find that

$$q_{1j}(u,v) = p_{1j}(u_1,v_1), \quad 1 \le j \le N,$$

$$q_{21}(u,v) = p_{21}(u_2,v_2)\,q_{11}(u,v),$$

$$q_{2j}(u,v) = p_{2j}(u_2,v_2) + p_{21}(u_2,v_2)\,q_{1j}(u,v), \quad 2 \le j \le N.$$

In general,

$$q_{ij}(u,v) = p_{ij}(u_i,v_i) + \sum_{k=1}^{i-1} p_{ik}(u_i,v_i)\,q_{kj}(u,v), \quad j \ge i,$$

$$q_{ij}(u,v) = \sum_{k=1}^{i-1} p_{ik}(u_i,v_i)\,q_{kj}(u,v), \quad 1 \le j < i. \tag{2.5}$$
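The successive substitution in (2.4) and (2.5) can be sketched for one fixed pair of controls as below. The matrix `P`, the vector `K`, and the function name are hypothetical example data and names. As a cross-check, the sketch also forms the same objects from the triangular splitting $P = L + U$, with $L$ the strictly lower triangular part, since (2.4) reads $D = LD + UC + K$.

```python
import numpy as np

def gauss_seidel_operator(P, K):
    """Q and K_bar with D = Q @ C + K_bar, built row by row as in (2.4)-(2.5)."""
    N = len(K)
    Q = np.zeros((N, N))
    K_bar = np.zeros(N)
    for i in range(N):
        Q[i, i:] = P[i, i:]              # terms that multiply the given C_j, j >= i
        Q[i] += P[i, :i] @ Q[:i]         # substitute the earlier rows D_k, k < i
        K_bar[i] = P[i, :i] @ K_bar[:i] + K[i]
    return Q, K_bar

# Hypothetical data for one fixed control pair (u, v).
rng = np.random.default_rng(2)
rho = 0.9
P = rng.random((5, 5))
P = rho * P / P.sum(axis=1, keepdims=True)   # discount folded into P
K = rng.random(5)
Q, K_bar = gauss_seidel_operator(P, K)

# Same objects from the triangular splitting P = L + U.
L = np.tril(P, -1)                           # strictly lower part
U = np.triu(P)                               # diagonal and upper part
Q_tri = np.linalg.solve(np.eye(5) - L, U)
K_tri = np.linalg.solve(np.eye(5) - L, K)
print(np.allclose(Q, Q_tri), Q.sum(axis=1).max())
```

In the discounted case the printed maximum row sum of `Q` should not exceed `rho`; only the first row attains it in general, since later rows pass part of their mass through already-discounted rows.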

$Q(u,v)$ can also be defined from (2.4) in terms of the upper and lower triangular matrices formed from $P(u,v)$, but we prefer to write the details. Also, the components $\bar k_i(u,v)$ of the cost vector $\bar K(u,v)$ associated with $Q(u,v)$ are

$$\bar k_1(u,v) = k_1(u_1,v_1),$$

$$\bar k_2(u,v) = p_{21}(u_2,v_2)\,\bar k_1(u,v) + k_2(u_2,v_2),$$

and, in general,

$$\bar k_i(u,v) = \sum_{j=1}^{i-1} p_{ij}(u_i,v_i)\,\bar k_j(u,v) + k_i(u_i,v_i). \tag{2.6}$$

Note that, for our discounted cost problem where the discount factor is incorporated into the $p_{ij}(u_i,v_i)$, $Q(u,v)$ is a degenerate transition matrix since the row sums satisfy $\sum_j q_{ij}(u,v) \le \rho$ for all $i$ and controls. If there is no discounting (i.e., $\rho = 1$), then the row sums are always unity. These facts are easily proved by induction, starting with $i = 1$.

Theorem 2.1. For any $C^0$, the $C^n$ in (2.3) converges to $C$.

Proof. Since

$$C_i = \sup_{v_i} \inf_{u_i} \Big[\sum_{j=1}^{i-1} p_{ij}(u_i,v_i)\,C_j + \sum_{j=i}^{N} p_{ij}(u_i,v_i)\,C_j + k_i(u_i,v_i)\Big], \tag{2.7}$$

by successive substitutions, we can write (2.1) in the equivalent form

$$C = \sup_v \inf_u \left[Q(u,v)C + \bar K(u,v)\right] = Q(\bar u,\bar v)C + \bar K(\bar u,\bar v), \tag{2.8}$$

where $\bar u, \bar v$ are controls that realize (2.1). Similarly, with $u^n, v^n$ realizing (2.3), the following is equivalent to (2.3):

$$C^{n+1} = \sup_v \inf_u \left[Q(u,v)C^n + \bar K(u,v)\right] = Q(u^n,v^n)C^n + \bar K(u^n,v^n). \tag{2.9}$$

In (2.8) and (2.9), it is understood that the inf and sup are again taken line by line, in the order $i = 1, 2, \ldots$. The inf and sup in line 1 are over $u_1$ and $v_1$, and in turn, that in line $i$ are over $u_i, v_i$.

For any $u, v$, and $i = 1, \ldots, N$, (2.1) yields

$$
\begin{aligned}
&\sum_{j=1}^{i-1} p_{ij}(\bar u_i,v_i)\,C_j + \sum_{j=i}^{N} p_{ij}(\bar u_i,v_i)\,C_j + k_i(\bar u_i,v_i) \\
&\quad \le C_i = \sup_{v_i} \inf_{u_i} \Big[\sum_{j=1}^{i-1} p_{ij}(u_i,v_i)\,C_j + \sum_{j=i}^{N} p_{ij}(u_i,v_i)\,C_j + k_i(u_i,v_i)\Big] \\
&\quad = \sum_{j=1}^{i-1} p_{ij}(\bar u_i,\bar v_i)\,C_j + \sum_{j=i}^{N} p_{ij}(\bar u_i,\bar v_i)\,C_j + k_i(\bar u_i,\bar v_i) \\
&\quad \le \sum_{j=1}^{i-1} p_{ij}(u_i,\bar v_i)\,C_j + \sum_{j=i}^{N} p_{ij}(u_i,\bar v_i)\,C_j + k_i(u_i,\bar v_i).
\end{aligned} \tag{2.10}
$$

In vector notation, this can be written as

$$P(\bar u,v)C + K(\bar u,v) \le C \le P(u,\bar v)C + K(u,\bar v).$$

It can also be written as

$$Q(\bar u,v)C + \bar K(\bar u,v) \le C \le Q(u,\bar v)C + \bar K(u,\bar v). \tag{2.11}$$

For any $u, v$, (2.3) or, equivalently, (2.9) yields

$$Q(u^n,v)C^n + \bar K(u^n,v) \le C^{n+1} = \sup_v \inf_u \left[Q(u,v)C^n + \bar K(u,v)\right] \le Q(u,v^n)C^n + \bar K(u,v^n). \tag{2.12}$$

Selecting $(u,v) = (u^n,v^n)$ in (2.11) and $(u,v) = (\bar u,\bar v)$ in (2.12) yields

$$
\begin{aligned}
Q(u^n,\bar v)\,(C^n - C) &= \left[Q(u^n,\bar v)C^n + \bar K(u^n,\bar v)\right] - \left[Q(u^n,\bar v)C + \bar K(u^n,\bar v)\right] \\
&\le C^{n+1} - C \\
&\le \left[Q(\bar u,v^n)C^n + \bar K(\bar u,v^n)\right] - \left[Q(\bar u,v^n)C + \bar K(\bar u,v^n)\right] = Q(\bar u,v^n)\,(C^n - C).
\end{aligned} \tag{2.13}
$$

Iterating yields

$$\left[Q(u^n,\bar v) \cdots Q(u^1,\bar v)\right](C^1 - C) \le C^{n+1} - C \le \left[Q(\bar u,v^n) \cdots Q(\bar u,v^1)\right](C^1 - C). \tag{2.14}$$

For the discounted problem the row sums of $Q(u,v)$ satisfy $\sum_j q_{ij}(u,v) \le \rho$ for all $i$ and controls. Hence $C^n \to C$ as $n \to \infty$. ■

Stopping when hitting a boundary set. Now, we allow $\rho \in (0,1]$, so that the discounting can be dropped if desired. Suppose that the process stops when a boundary set is hit and that the mean time to reach the boundary set is bounded, uniformly in the controls and initial condition. Thus we can suppose that the boundary set is absorbing and has zero cost. Without loss of generality, let 0 denote the boundary state. Let $P(u,v)$ still denote the matrix of transition probabilities among the states $1, \ldots, N$ only. With $C = \{C_i; 1 \le i \le N\}$ and $C^n = \{C_i^n; 1 \le i \le N\}$, the products on either side of (2.14) go to zero. Thus $C^n \to C$. It is preferable if state 1 is connected to the boundary and the states are ordered so that the "mean flow" is toward the boundary as one goes from the lower to the higher numbered states, where possible. The transition matrix $Q$ associated with such an ordering has a faster absorption of the process at the boundary, which implies a faster convergence of the algorithm [7], [12] and [11, Chapter 6].

Optional stopping problems. As in the above paragraph, let $\rho \in (0,1]$. Various forms of optional stopping can be handled. There are now three ways that the process can be stopped. One is by hitting a predefined stopping set, denoted by state 0, as in the previous paragraph. Call the time $\tau_0$. Otherwise, either player can decide to have the game stopped. The associated times are called $\tau_i$ for player $i$. After stopping for whatever reason, the state goes to the absorbing state 0, with zero holding cost there. The $P(u,v)$ represents the transition probabilities only among the states $1, \ldots, N$. For given functions $g_i(\cdot)$, the cost is now

$$C_i(u,v) = E_i^{u,v}\Big[\sum_{n=0}^{\min\{\tau_0,\tau_1,\tau_2\}-1} k_{X_n}(u_{X_n},v_{X_n}) + g_1(X_{\tau_1})\,I_{\{\tau_1<\tau_2,\ \tau_1<\tau_0\}} + g_2(X_{\tau_2})\,I_{\{\tau_2<\min\{\tau_1,\tau_0\}\}}\Big]. \tag{2.15}$$

The controls $u_i, v_i$ can now take the new value stop as well as the original values used in Theorem 2.1. Let $k_i(u_i,v_i) \ge \epsilon > 0$ for all values other than the value stop, and suppose that $g_i(\cdot)$ [...] but $g_1(i) \ge g_2(i)$. Extend the definition of the $k_i(\cdot)$ to include the control value stop, by writing $k_i(\text{stop},v_i) = g_1(i)$ and let $k_i(u_i,\text{stop}) = g_2(i)$ if $u_i \ne \text{stop}$ and let it be zero otherwise. Then the Gauss-Seidel algorithm can be written as (2.3). We have $g_2(i) \le C_i^n \le g_1(i)$. Similarly (2.1) holds and $g_2(i) \le C_i \le g_1(i)$.

Let $(u^n,v^n)$ satisfy (2.3) and $(\bar u,\bar v)$ satisfy (2.1). Due to the positivity of $k_i(u_i,v_i)$, for $u_i$ and/or $v_i$ not equal to stop, if player 1 uses $u^n$ and player 2 uses some $v^n$ at time $n$, then $P(u^n,v^n) \cdots P(u^0,v^0) \to 0$ and $Q(u^n,v^n) \cdots Q(u^0,v^0) \to 0$ as $n \to \infty$ uniformly in the $\{v^n\}$ choices. Analogously, $Q(u,v^n) \cdots Q(u,v^0) \to 0$ for all $\{u^n\}$ choices. Using these facts and following the logic of the proof of Theorem 2.1 yields the convergence $C^n \to C$ for this problem.

III. The Ergodic Cost Problem

Now $\rho = 1$ and $P(u,v)$ is the transition matrix for a controlled Markov chain which is ergodic under any $u, v$. We adapt the procedure of [7, pp. 156-158], originally due to White [15]. Let $e$ denote the $N$-vector, all of whose components are unity.

A Jacobi procedure. We first consider the analog of the simple backwards iteration (Jacobi) procedure (2.2), whose convergence for the game with ergodic payoffs has not been proved to date in the literature. For arbitrary $W^0$, define the vectors $W^n, w^n$ recursively by

$$W^n = \sup_v \inf_u \left[P(u,v)w^{n-1} + K(u,v)\right], \qquad w^n = W^n - W^n_{j_0} e, \tag{3.1}$$

where $j_0$ is defined above (1.3). There is a value for the game [4, Section 5.2]. The value $\gamma$ is given by

$$W + \gamma e = \sup_v \inf_u \left[P(u,v)W + K(u,v)\right]. \tag{3.2}$$

As for the control problem, the value of $W$ is unique, up to the addition of a vector with constant components, and the value of $\gamma$ is unique. An alternative way of writing (3.2) is as

$$W = \sup_v \inf_u \left[P(u,v)w + K(u,v)\right], \qquad w = W - W_{j_0} e. \tag{3.2a}$$

Theorem 3.1. $w^n$ converges, and $W^n_{j_0}$ converges to the value $\gamma$ of the game.

Proof. Recall the condition (1.3) and the definitions of $m$ and $j_0$ there. Let $u^n, v^n$ be the selected controls in (3.1). Define $c_n = W^n_{j_0}$. Then, for any $u, v$,



$$P(u^n,v)w^{n-1} + K(u^n,v) \le W^n = P(u^n,v^n)w^{n-1} + K(u^n,v^n) \le P(u,v^n)w^{n-1} + K(u,v^n). \tag{3.3}$$

Let $(u,v) = (u^{n-1},v^{n-1})$ in (3.3) and use the definition of $w^n$ in (3.1) to get

$$P(u^n,v^{n-1})w^{n-1} + K(u^n,v^{n-1}) - c_n e \le w^n \le P(u^{n-1},v^n)w^{n-1} + K(u^{n-1},v^n) - c_n e.$$

Replacing $n$ with $n-1$ in (3.3) and letting $(u,v) = (u^n,v^n)$ yields, for $i \le N$,

$$P(u^{n-1},v^n)w^{n-2} + K(u^{n-1},v^n) - c_{n-1} e \le w^{n-1} \le P(u^n,v^{n-1})w^{n-2} + K(u^n,v^{n-1}) - c_{n-1} e.$$

The last two inequalities yield

$$P(u^n,v^{n-1})\,(w^{n-1} - w^{n-2}) - (c_n - c_{n-1})e \le w^n - w^{n-1} \le P(u^{n-1},v^n)\,(w^{n-1} - w^{n-2}) - (c_n - c_{n-1})e. \tag{3.4}$$

Iterating (3.4) $m-1$ times leads to

$$
\begin{aligned}
&P(u^n,v^{n-1}) \cdots P(u^{n-m+1},v^{n-m})\,(w^{n-m} - w^{n-m-1}) - (c_n - c_{n-m})e \\
&\quad \le w^n - w^{n-1} \\
&\quad \le P(u^{n-1},v^n) \cdots P(u^{n-m},v^{n-m+1})\,(w^{n-m} - w^{n-m-1}) - (c_n - c_{n-m})e.
\end{aligned} \tag{3.5}
$$

Define $\delta w^n = w^n - w^{n-1} = \{\delta w_i^n; i \le N\}$. Then the right hand inequality of (3.5) yields, for $i \le N$,

$$\delta w_i^n \le \sum_j p_{ij}^{(m)}(u^{n-1},v^n; \cdots; u^{n-m},v^{n-m+1})\,\delta w_j^{n-m} - \left[W^n_{j_0} - W^{n-m}_{j_0}\right]. \tag{3.6}$$

Since $w^n_{j_0} = 0$ for all $n$, we have $\delta w^n_{j_0} = 0$. This, with (3.6) and (from (1.3))

$$p_{ij_0}^{(m)}(u^{n-1},v^n; u^{n-2},v^{n-1}; \cdots; u^{n-m},v^{n-m+1}) \ge \epsilon > 0$$

for all $i$, $n$, and controls, yields for $i \le N$

$$\max_i \delta w_i^n \le (1-\epsilon) \max_j \delta w_j^{n-m} - \left[W^n_{j_0} - W^{n-m}_{j_0}\right].$$

Analogously, using the fact that $\min_i \delta w_i^n \le 0$ and that

$$p_{ij_0}^{(m)}(u^n,v^{n-1}; u^{n-1},v^{n-2}; \cdots; u^{n-m+1},v^{n-m}) \ge \epsilon$$

for all $i, n$, the left hand inequality of (3.5) yields, for $i \le N$,

$$\min_i \delta w_i^n \ge (1-\epsilon) \min_j \delta w_j^{n-m} - \left[W^n_{j_0} - W^{n-m}_{j_0}\right].$$

Hence, for all $n$, where we define $[\max_i - \min_i]\,a_i = \max_i a_i - \min_i a_i$,

$$\Big[\max_i - \min_i\Big]\,\delta w_i^n \le (1-\epsilon)\Big[\max_j - \min_j\Big]\,\delta w_j^{n-m}, \tag{3.7}$$

which implies that $w^n$ converges to, say, $w$. Hence $W^n$ converges to, say, $W$, and the limits satisfy (3.2a). Hence, (3.2) holds with $\gamma = W_{j_0}$. ■
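The recursion (3.1) can be sketched for finite control sets as follows. This is an illustration only: the array layout, the random game, and the function names are assumptions, the sup inf is taken over non-randomized controls, and the transition probabilities are made strictly positive so that a condition of the form (1.3) holds with $m = 1$.

```python
import numpy as np

def sup_inf(payoff):
    """sup over b of inf over a of payoff[i, a, b], for every line i."""
    return payoff.min(axis=1).max(axis=1)

def relative_value_iteration(p, k, j0=0, sweeps=2000):
    """Iterate (3.1): W^n = sup inf [P w^{n-1} + K], w^n = W^n - W^n_{j0} e."""
    W = np.zeros(k.shape[0])            # arbitrary W^0
    w = W - W[j0]
    for _ in range(sweeps):
        W = sup_inf(p @ w + k)
        w = W - W[j0]                   # keeps w[j0] == 0
    return W, w

# Hypothetical game with strictly positive transition probabilities, so that
# (1.3) holds with m = 1 and any j0.
rng = np.random.default_rng(3)
p = rng.random((5, 3, 3, 5)) + 0.1
p /= p.sum(axis=-1, keepdims=True)
k = rng.random((5, 3, 3))

W, w = relative_value_iteration(p, k)
gamma = W[0]                            # limit of c_n = W^n_{j0}
print("estimated ergodic value:", gamma)
```

Subtracting $W^n_{j_0} e$ at each step is what keeps the iterates bounded; the subtracted scalar is the running estimate of $\gamma$, in line with the last step of the proof.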


The Gauss-Seidel procedure. The Gauss-Seidel form of (3.1) is, in order $i = 1, 2, \ldots$,

$$W_i^n = \sup_{v_i} \inf_{u_i} \Big[\sum_{j=1}^{i-1} p_{ij}(u_i,v_i)\,W_j^n + \sum_{j=i}^{N} p_{ij}(u_i,v_i)\left(W_j^{n-1} - W_{j_0}^{n-1}\right) + k_i(u_i,v_i)\Big]. \tag{3.8}$$

Recall the definition of $Q(u,v)$ and $\bar K(u,v)$ from Section 2. Then, in matrix notation, (3.8) can be written as

$$W^n = \sup_v \inf_u \left[Q(u,v)w^{n-1} + \bar K(u,v)\right], \qquad w^n = W^n - W^n_{j_0} e. \tag{3.9}$$

The condition (1.3) is no longer sufficient for convergence. For arbitrary controls $\{u^n,v^n\}$, let $q_{ij}^{(n)}(u^n,v^n; \cdots; u^1,v^1)$ denote the $i,j$th element of $Q(u^n,v^n) \cdots Q(u^1,v^1)$. We now require the additional condition that there are $\epsilon > 0$, $j_0$, and an integer $m$, such that for all controls $\{u^n,v^n\}$ and all $i$,

$$q_{ij_0}^{(m)}(u^m,v^m; u^{m-1},v^{m-1}; \cdots; u^1,v^1) \ge \epsilon > 0. \tag{3.10}$$

The condition is discussed below the theorem.

Theorem 3.2. $w^n$ converges, and $W^n_{j_0}$ converges to the value $\gamma$ of the game.

Proof. The proof is just an adaptation of that of Theorem 3.1, analogously to the way that the proof of Theorem 2.1 is an adaptation of the proof of the convergence of the classical procedure (2.2) of value iteration for the discounted cost problem. Let $u^n, v^n$ be the selected values in (3.8) or (3.9). Then the inequalities (3.3) hold with $(Q, \bar K)$ replacing $(P, K)$. Analogously to the development in Theorem 3.1, this and (3.10) imply (3.7) and the theorem. ■

Discussion of (3.10). Consider a one dimensional reflected diffusion on the finite interval $[A, B]$, $B > A$, and let the variance be strictly positive. Approximate this by an $N$-state Markov chain via the methods of [11]. The reflecting states are 1 and $N$, which correspond to $A$ and $B$, resp. If the discretization interval is small enough, then each state communicates with its immediate neighbors only, with probabilities that are bounded away from zero, uniformly in the controls. Then $\inf_{u,v} q_{i2}(u,v) > 0$ and we can use any $m \ge 1$ and $j_0 = 2$ in (3.10). This is a consequence of the form of the Gauss-Seidel iteration, which connects states to those that are lower in the order of the iteration. An analogous result holds for the multidimensional case, if the diffusion being approximated is non-degenerate. See [11] for details concerning the approximation, which is the same for the game problem.

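The nearest-neighbor example in the discussion of (3.10) can be checked numerically. The sketch below is hypothetical in its details: it uses an uncontrolled five-state reflecting walk with step probability `down = 0.5`, and it forms $Q$ from the triangular splitting $P = L + U$ ($L$ strictly lower triangular), which is the matrix form of the successive substitution of Section 2.

```python
import numpy as np

def reflecting_walk(N, down=0.5):
    """Nearest-neighbor chain on states 1..N (indices 0..N-1), reflecting ends."""
    P = np.zeros((N, N))
    P[0, 1] = 1.0
    P[N - 1, N - 2] = 1.0
    for i in range(1, N - 1):
        P[i, i - 1] = down
        P[i, i + 1] = 1.0 - down
    return P

def gauss_seidel_matrix(P):
    """Q = (I - L)^{-1} U, with L the strictly lower part of P and U the rest."""
    L = np.tril(P, -1)
    U = np.triu(P)
    return np.linalg.solve(np.eye(len(P)) - L, U)

P = reflecting_walk(5)
Q = gauss_seidel_matrix(P)
j0 = 1                                   # state 2 in the paper's 1-based numbering
print("column j0 of P:", P[:, j0])
print("column j0 of Q:", Q[:, j0])
```

In this example the column of `P` for state 2 contains zeros, while the same column of `Q` is strictly positive in every row, because each row of `Q` inherits mass from the rows above it in the sweep order.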


IV.  Conclusions. 

For solving optimization problems for control and games for finite-state Markov chain models via value iteration, the Gauss-Seidel method is generally faster than the Jacobi procedure. The proof of convergence for the control problem is well known, but was not available for the game problem. For the problem of games, it is shown that the Gauss-Seidel procedure converges for the discounted, optimal stopping, and ergodic cost criteria.